[CS2] Fix handling of tabbed code blocks in .litcoffee files #4485

GeoffreyBooth · 2017-04-05T07:32:52Z

So since #4345, code blocks in .litcoffee files that were indented with tabs were simply skipped over, and the code never executed. That’s why the tests were “passing,” because they were never running. Oops!

I wanted the tests to be a bit more thorough, so I duplicated the literate.litcoffee file to a version where all the code blocks are indented with tabs. Now not only are we testing that tabbed code blocks work, but that they work in all the same scenarios as spaced code blocks.

After a lot of investigation, and a detour in trying to implement this with Illiterate (see branch) I think the root cause is that our Markdown parsing library Marked just doesn’t handle tabbed code blocks properly. @billymoon was aware of this in his initial implementation, which had a magic token substituting for tabs, but it was buggy (for one thing, it was only replacing the first tab in the file, not all tabs or all leading tabs). I realized that since this input is just getting turned into JavaScript anyway, we might as well replace the leading tabs with spaces before Marked does its parsing, to sidestep Marked‘s bug. This seems to have solved it.

I went a little further to try to preserve correct line numbers for stack traces, which have also been completely blown away since we switched to using Marked. I pretty much have it fixed if you don’t do your own word wrapping for list items, which of course we’re doing in our tests. (Maybe we shouldn’t; I doubt it’s common for people to manually word-wrap their Markdown.) Anyway improving it further could be the subject of another PR if anyone is interested.

…parsing of tabbed .litcoffee files; and more accurate stack traces (assuming you don’t do your own word wrapping within list items)

lydell · 2017-04-05T07:50:37Z

src/helpers.coffee

+ # `marked` really doesn’t process tabs too well. All this code is headed
+ # into JavaScript anyway, so replace the tabs with two spaces to make
+ # `marked`’s job a little easier.
+ code = code.replace /^(\t)+/gm, (match) ->


What about multiline strings and regexes?

code here is the source code of an entire file. This function is intending to replace all leading tabs (between the start of any line and the first non-tab character) with 2 spaces × the number of leading tabs on that line. Does it not do that?

I’m worried that it changes the meaning of mulitline strings and regexes.

Here’s a multiline string: myString = """ line one line two """ eq myString, "line one\n\tline two"

Perhaps only replacing the first tab on each line is enough? code.replace /^\t/gm, ' '

That was the previous approach, replacing the first tab with a tab plus a magic token. Then after the parsing by Marked, the tab plus token were replaced back with a tab. For some reason I was getting lots of compiler errors regarding indentation when I did that, that I couldn't figure out. I'll look into it some more.

lydell · 2017-04-05T07:53:46Z

I doubt it’s common for people to manually word-wrap their Markdown.

Many text editors have a function to split long lines and insert newlines to make all lines fit within, say, 80 columns. In my experience, I’ve seen just as much “soft wrapped” as “hard wrapped” markdown in the wild.

GeoffreyBooth · 2017-04-05T08:11:41Z

Throw a console.log out before the last line of exports.invertLiterate and you can see what gets produced by Marked and our additional processing. It’s quite close to the original. Parsing literate.litcoffee produces a parsed out that’s 175 lines long, from an original that is 163. The extra lines come from newlines added between the lines of the list items (the Latin paragraphs).

Parsing scope.litcoffee produces an out that is 118 lines long, exactly the same as the original. sourcemap.litcoffee comes in at 166 to the original’s 177, because the original occasionally uses two blank lines between sections. Basically if you’re careful to follow a few rules, you can get your stack traces to match up perfectly. I’m not sure how we would handle these edge cases, but again, I think that’s a separate PR.

carlsmith · 2017-04-05T13:11:07Z

Just an aside: Almost every style guide for every language recommends spaces over tabs these days. Officially, Python only tolerates tabs to support existing codebases.

There's an opportunity now to mandate spaces. It's obviously going to upset some people, but it would simplify things if the new spec simply disallowed tabs in source code entirely.

GeoffreyBooth · 2017-04-05T15:25:16Z

@carlsmith That would be a breaking change with no upside for users. Sure it would make our lives easier, but supporting tabs isn't so hard. The compiler shouldn't also be a linter.

GeoffreyBooth · 2017-04-05T15:33:02Z

How bad would it be to return to the v1 implementation? What concrete bugs does using Marked solve? At the very least it makes stack traces difficult to keep correct. @billymoon

carlsmith · 2017-04-05T15:39:20Z

Most languages are trying to phase out the use of tabs, but if it's a non-starter, that's cool. It was worth a shot.

Thanks for everything you've been doing @GeoffreyBooth. Appreciate it.

lydell · 2017-04-05T17:12:02Z

Yet an approach would be to switch to another markdown parser, such as markdown-it.

billymoon · 2017-04-05T18:06:56Z

@GeoffreyBooth I am actually in favour of removing support for literate coffee from the project, and creating a more general project to handle any file-type, which is what I was experimenting with a POC for (https://github.com/billymoon/illiterate). The new workflow would be something like...

illiterate something.coffee.md | coffee

... and for other files ...

illiterate something.js.md | babel

... etc

Ultimately, I really like literate style, and want to be able to use it on all my code, not just coffee-script. Actually, I remember when adding a -l flag for supporting piped literate in coffeescript, we dropped linting feature to make space for literate with the rationale being that linting belongs in an external tool. I think literate is the same.

carlsmith · 2017-04-05T18:34:05Z

Just a thought: If the CoffeeScript compiler allows scripts to import other CoffeeScript modules, wouldn't it need to understand literate files? I may be wrong. I'm not familiar with how imports work, but in interpreted languages, the interpreter would need to understand literate syntax to import those files.

lydell · 2017-04-05T18:36:20Z

CoffeeScript doesn’t care what you try to import.

GeoffreyBooth · 2017-04-05T18:41:02Z

@billymoon take a look at my illiterate branch. I agree that that’s a good approach long-term, but it would be a pretty major change to CoffeeScript at the moment; not least because part of the compiler itself is written in .litcoffee.

One big problem that needs solving is source maps, or rather how to preserve accurate line numbers in stack traces. Check out that branch and introduce a syntax error near the end of literate.litcoffee or literate_tabbed.litcoffee and you’ll see how useless the error message is.

Also it really should understand tabs properly. Regardless of whatever movement I may be unaware of to get rid of tabs across the world, they make up substantial percentages of repos on GitHub, and I would only expect that to increase as more people realize there are no technical limitations anymore to working with tabs and source control. Tabs provide the benefit that different members of a team can choose their own indentation level (i.e. tell their editor to render a tab as being 2 spaces or 4) and that has really kept the peace on many of the teams I manage. I wouldn’t want to lose that option.

lydell · 2017-04-05T19:16:42Z

Please let’s not discuss tabs vs spaces. Markdown supports both spaces and tabs for code blocks, so we have to as well. Period. :)

https://daringfireball.net/projects/markdown/syntax#precode
http://spec.commonmark.org/0.27/#tabs

carlsmith · 2017-04-05T21:07:43Z

@GeoffreyBooth - I never said anything about any movements across the world. I said...

Almost every style guide for every language recommends spaces over tabs these days. Officially, Python only tolerates tabs to support existing codebases.

...which is objectively true, these days. Tons of code exists that was written with tabs, which is why they're a bitch to phase out, but the general consensus, at least amongst the authors of official style guides, is that we should stop using them.

The Python Style Guide says don't use tabs. Same for Ruby, PHP and Java, as well as Google's style guides for CSS, C++ and JavaScript. They're the first examples I could find yesterday during a similar conversation. I just looked at Google's Markdown Style Guide as well, and it says "use four spaces".

You made it clear that you aren't interested, which I happily accepted, and thanked you for your hard work:

Most languages are trying to phase out the use of tabs, but if it's a non-starter, that's cool. It was worth a shot.
Thanks for everything you've been doing @GeoffreyBooth. Appreciate it.

I don't understand why you mocked me for that, but it doesn't really matter. If you think tabs are better than spaces, and that they're going to grow in popularity (despite the style guides), then you should obviously support them. There's no point countering your argument for tabs over spaces, as we've all heard both sides of that debate a thousand times.

GeoffreyBooth · 2017-04-05T21:24:04Z

@carlsmith My apologies if I came across as mocking, that wasn’t my intent. And I agree with @lydell, I don’t want yet another flame war about tabs vs spaces. I was trying to avoid taking a side; my point was just that tabs are still quite popular, despite style guides advising against them, and so we need to support them. I doubt many of those projects are just legacy codebases that haven’t been updated to the latest styles. Spaces have been most languages’ recommended style for a long, long time.

carlsmith · 2017-04-05T21:44:51Z

@GeoffreyBooth - No problem. To be fair, I'm tired, and probably shouldn't have let your comment bother me as much as I did. Sorry mate. I have these days sometimes, and can only apologise.

I understand what you're saying about tabs, and it does make sense. I'm a four-spaces guy, so there's a good chance I'm guilty of an unconscious bias here.

The thing is, when you don't really contribute to a project, but care about it a lot, you end up in a tricky situation. On one hand, you always want to chip in with your own perspective, but on the other hand, you don't want to be the ideas-guy that argues for a particular position, but then expects someone else to implement it.

GeoffreyBooth · 2017-04-05T22:15:21Z

@carlsmith And I bet it irks you to read the CoffeeScript codebase with its 2-space indentation, doesn’t it? If only it had been written in tabs, that you could choose to display as 4 spaces . . . 😛

I wouldn’t say you don’t contribute to the project. You’ve written a lot in many of the issue threads, and your comments have been valuable and helped guide the discussion and consensus. That in and of itself is a positive contribution, and I appreciate it.

At work I manage teams of developers. I’ve had to mediate this dispute way too many times. I’ve adopted a policy of letting each team decide for themselves per language what they want. Most people—certainly not everyone, but more than half—prefer spaces, and often they can agree on either 2 spaces or 4 for indentation; but sometimes there are fierce disagreements over 2 spaces versus 4. When that happens the solution is to compromise on tabs, so that each camp can see the code the way they prefer. I once had someone insist that even that wasn’t good enough, that she needed not just 4-space indentation but 4 actual space characters as well, and I wrote her some Sublime Text macros to auto-format her code to 4-spaces on file open and back to tabs on save. That’s pretty extreme, but it gets to the larger point that we’re not in the 1970s or 1990s anymore: we’re not constrained by technical limitations that force one way or another, so each team member should get to see their own preference, just like each developer gets to choose their own editor or syntax highlighting. There’s nothing wrong with tabs, like there’s nothing wrong with 2 spaces or 4. They’re all reasonable choices, and we shouldn’t make people preprocess their code to our standard before the compiler will accept it.

I don’t know about Python, but Go is going in the opposite direction: http://stackoverflow.com/a/19094725/223225

carlsmith · 2017-04-05T22:55:38Z

And I bet it irks you to read the CoffeeScript codebase with its 2-space indentation, doesn’t it? If only it had been written in tabs, that you could choose to display as 4 spaces . . . 😛

You've got me there :)

I wouldn’t say you don’t contribute to the project. You’ve written a lot in many of the issue threads, and your comments have been valuable and helped guide the discussion and consensus. That in and of itself is a positive contribution, and I appreciate it.

Thank you. That's very kind of you.

The Go approach is really interesting.

carlsmith · 2017-04-05T23:04:29Z

@GeoffreyBooth - I hope your team appreciate how well you deal with people. People are difficult. You have the qualities a good lead should have, but all too often lack. Thanks again mate, and sorry for taking things the wrong way.

billymoon · 2017-04-06T00:38:44Z

@lydell thanks for the tip, I updated illiterate to use markdown-it

@GeoffreyBooth I broke the back of adding source maps to illiterate, currently as inline data uri string. I need to think about the whole cli command structure to allow for optional source maps and on a per file basis, and how to deal with input from stdio, but the source map writing part is done (mapping full source line to destination line, disregarding columns, which are going to be identical in this case). Does that help at all with your experimental illiterate branch?

GeoffreyBooth · 2017-04-06T01:39:25Z

@billymoon I’m not sure how I would pass along and process the sourcemaps generated by Illiterate in the main CoffeeScript parser. The invertLiterate helper just returns a string. Certainly we can update it to return an object and the main compiler can pull out the source and the sourcemap, but then we’d have to merge this sourcemap with CoffeeScript’s sourcemap at the end of compilation. Do you want to take a crack at that?

I at least figured out why the “replace tabs with a magic token” approach was failing: Marked itself replaces tabs with spaces! See:

node -e 'console.log(require("marked").lexer("\t\tfoo();\t\t\tbar();", {}))'
[ { type: 'code', text: '    foo();            bar();' },
  links: {} ]

…eScript files; use MarkdownIt’s `map` property to preserve correct line numbers

…tarting indentation

…s must be unindented

…ed a blank line separating them from the list item text

GeoffreyBooth · 2017-04-06T03:14:30Z

Okay, I think I’ve got it! Including correct line numbers in stack traces!

The function is so short now I’ll just paste it here:

exports.invertLiterate = (code) ->
  out = []
  md.renderer.rules =
    code_block: (tokens, idx) ->
      startLine = tokens[idx].map[0]
      lines = tokens[idx].content.split '\n'
      for line, i in lines
        out[startLine + i] = line
  md.render code
  out.join '\n'

Yes it could squeeze even shorter, but I’m not trying to golf here. Anyway let’s unpack this. First, @lydell’s suggestion of MarkdownIt (md above) is a winner! That library doesn’t convert tabs into spaces, so there’s no tab substitution/reparation necessary. I added a new test to both the spaced and tabbed Literate CoffeeScript tests to confirm that block strings are unaffected.

Second, @billymoon’s commit had the brilliant idea of using MarkdownIt’s rules engine for pulling out the code blocks. Much easier than my first idea, which was to use its parse method.

I inspected what the code_block function was getting, and the tokens[idx] object not only contained a content property with the code source, but also a map property that’s an array like [ 70, 72 ] telling us the first and last line of the original source that this token covers. This is perfect! Now we can line up every code block token with its correct line number! So that’s what I’m doing here: splitting each code block into an array of lines, then assigning those lines to out using the line number of each line of code as out’s array index. Then when we join out back together into a string separated by newlines, all the lines of code are have the same line numbers as they did in the original .litcoffee file. Stack traces for syntax errors in Literate CoffeeScript files now always show the correct line content and number.

There is one caveat: the new library introduces a two minor breaking changes to Literate CoffeeScript:

Code blocks within HTML blocks must now be unindented. b8eed42. In other words, if you want a code block within an HTML paragraph, it can’t be entirely indented. At least one line of the block must be at the first column, i.e. unindented.
Code blocks within lists need a blank line separating them from the list item text. 7705438. It’s no longer enough to just indent a code block adjacent to a list item. You need a blank line first.

I can live with these changes. I had to modify two tests to accommodate them, but nothing in scope.litcoffee or sourcemap.litcoffee needed to change. We can note these in the docs, and I don’t think they’d cause too much of an outcry.

billymoon · 2017-04-06T10:10:29Z

@GeoffreyBooth just checking, is it still desirable to externalise literate feature and pass source maps from illiterate? I can have a bash at that if it's useful.

p.s. you know the code is right when there are only 10 lines :)

GeoffreyBooth · 2017-04-06T16:58:58Z

@billymoon Yes and no 😄 I think at this point people expect the CoffeeScript compiler to process .litcoffee files, and it would be a breaking change if it didn’t, so coffee test.litcoffee needs to keep working no matter what. I would be happy to replace my 10 lines with a call to Illiterate, if it wants to return the same string that out currently gets returned as. Though that would mean that Illiterate essentially just becomes this function, which may or may not be what you want.

There’s actually no need for source maps using the final implementation I have now, as every line of code is at the same row and column as the original Markdown; Illiterate could take the same approach, which might simplify things.

billymoon · 2017-04-06T17:10:37Z

@GeoffreyBooth well, if that's the case, I don't really see much use in putting illiterate in. Keep up the good work :)

GeoffreyBooth · 2017-04-06T18:24:24Z

@billymoon sorry. I support what you’re doing though, I think you’re right and Illiterate is the way Literate CoffeeScript should’ve been implemented in the first place. If you want to propose a way to replace our current hodgepodge with Illiterate, and it wouldn’t be a breaking change, I would heartily support that.

GeoffreyBooth added 2 commits April 4, 2017 23:29

Add tabbed literate test; modernize Markdown title heading

6a31359

Better parsing of Literate CoffeeScript files, including now correct …

17c7333

…parsing of tabbed .litcoffee files; and more accurate stack traces (assuming you don’t do your own word wrapping within list items)

GeoffreyBooth added this to the 2.0.0 milestone Apr 5, 2017

lydell reviewed Apr 5, 2017

View reviewed changes

GeoffreyBooth added 6 commits April 5, 2017 19:32

Swap Marked for MarkdownIt for parsing the Markdown of Literate Coffe…

d9e1fd4

…eScript files; use MarkdownIt’s `map` property to preserve correct line numbers

Literate CoffeeScript tests: remove trailing whitespace, fix spelling

3585d2a

Literate CoffeeScript tests: add block quote test

c166486

Literate CoffeeScript (tabbed, at least) seems to need a consistent s…

a3d6f27

…tarting indentation

Restore test

5826ca9

Reference links work now in MarkdownIt

b401f84

GeoffreyBooth added 2 commits April 5, 2017 19:48

Breaking change in Literate CoffeeScript: code blocks within HTML tag…

b8eed42

…s must be unindented

Breaking change in Literate CoffeeScript: code blocks within lists ne…

7705438

…ed a blank line separating them from the list item text

lydell approved these changes Apr 6, 2017

View reviewed changes

GeoffreyBooth merged commit 5e1d978 into jashkenas:2 Apr 6, 2017

GeoffreyBooth deleted the litcoffee-bug branch April 6, 2017 17:09

GeoffreyBooth mentioned this pull request Apr 9, 2017

[CS2] 2.0.0-beta1 docs #4494

Merged

GeoffreyBooth added the fixed label Apr 9, 2017

This was referenced Apr 13, 2017

Fixes to support CS2 clutchski/coffeelint#596

Merged

CS2 Discussion: Project: Progress as of April 2017: 2.0.0-beta1! coffeescript6/discuss#80

Closed

coffeescriptbot mentioned this pull request Feb 19, 2018

CS2 Discussion: Project: Progress as of April 2017: 2.0.0-beta1! #4973

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CS2] Fix handling of tabbed code blocks in .litcoffee files #4485

[CS2] Fix handling of tabbed code blocks in .litcoffee files #4485

GeoffreyBooth commented Apr 5, 2017

lydell Apr 5, 2017

GeoffreyBooth Apr 5, 2017

lydell Apr 5, 2017 •

edited

Loading

lydell Apr 5, 2017 •

edited

Loading

GeoffreyBooth Apr 5, 2017

lydell commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017

lydell commented Apr 5, 2017

billymoon commented Apr 5, 2017 •

edited

Loading

carlsmith commented Apr 5, 2017

lydell commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

lydell commented Apr 5, 2017 •

edited

Loading

carlsmith commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017 •

edited

Loading

carlsmith commented Apr 5, 2017

billymoon commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

billymoon commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

billymoon commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

[CS2] Fix handling of tabbed code blocks in .litcoffee files #4485

[CS2] Fix handling of tabbed code blocks in .litcoffee files #4485

Conversation

GeoffreyBooth commented Apr 5, 2017

lydell Apr 5, 2017

Choose a reason for hiding this comment

GeoffreyBooth Apr 5, 2017

Choose a reason for hiding this comment

lydell Apr 5, 2017 • edited Loading

Choose a reason for hiding this comment

lydell Apr 5, 2017 • edited Loading

Choose a reason for hiding this comment

GeoffreyBooth Apr 5, 2017

Choose a reason for hiding this comment

lydell commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017

lydell commented Apr 5, 2017

billymoon commented Apr 5, 2017 • edited Loading

carlsmith commented Apr 5, 2017

lydell commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

lydell commented Apr 5, 2017 • edited Loading

carlsmith commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017

GeoffreyBooth commented Apr 5, 2017

carlsmith commented Apr 5, 2017 • edited Loading

carlsmith commented Apr 5, 2017

billymoon commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

billymoon commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

billymoon commented Apr 6, 2017

GeoffreyBooth commented Apr 6, 2017

lydell Apr 5, 2017 •

edited

Loading

lydell Apr 5, 2017 •

edited

Loading

billymoon commented Apr 5, 2017 •

edited

Loading

lydell commented Apr 5, 2017 •

edited

Loading

carlsmith commented Apr 5, 2017 •

edited

Loading